Load Balancing

When NGINX acts as a reverse proxy, it can distribute incoming requests across multiple backend servers to achieve:

  • High availability
  • Better performance
  • Scalability
  • Fault tolerance

This is done using an upstream block.

Basic Upstream Configuration

upstream app_backend {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
    server 10.0.0.13:8080;
}

By default, NGINX uses round-robin.

Round-Robin Load Balancing (Default)

Requests are distributed sequentially across backend servers:

Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A

Each request goes to the next server in order.

Example Configuration

upstream app_backend {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

server {
    listen 80;

    location / {
        proxy_pass http://app_backend;
    }
}
  • First request → 10.0.0.11
  • Second request → 10.0.0.12
  • Third request → 10.0.0.11
  • Even distribution over time
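The rotation described above can be sketched in a few lines of Python (purely illustrative; NGINX implements this internally in C):

```python
from itertools import cycle

# Toy sketch of round-robin rotation over the two backends from the
# example config above; NGINX's real implementation differs internally,
# but the resulting request pattern is the same.
backends = ["10.0.0.11:8080", "10.0.0.12:8080"]
rotation = cycle(backends)

picks = [next(rotation) for _ in range(3)]
print(picks)  # ['10.0.0.11:8080', '10.0.0.12:8080', '10.0.0.11:8080']
```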

Weighted Round-Robin

upstream app_backend {
    server 10.0.0.11 weight=3;
    server 10.0.0.12 weight=1;
}
  • Server 10.0.0.11 receives 3 of every 4 requests (75%)
  • Server 10.0.0.12 receives 1 of every 4 (25%)
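The 3:1 split can be illustrated with a small Python simulation of a "smooth" weighted round-robin (a sketch of the idea, not NGINX's actual implementation):

```python
# Toy smooth weighted round-robin: each round, every server's running
# score grows by its weight; the highest score wins and then pays back
# the total weight. Over time, picks match the weight ratio.
def smooth_wrr(servers, n):
    current = {name: 0 for name in servers}
    total = sum(servers.values())
    picks = []
    for _ in range(n):
        for name, weight in servers.items():
            current[name] += weight
        winner = max(current, key=current.get)
        current[winner] -= total
        picks.append(winner)
    return picks

picks = smooth_wrr({"10.0.0.11": 3, "10.0.0.12": 1}, 8)
print(picks.count("10.0.0.11"), picks.count("10.0.0.12"))  # 6 2
```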

Pros

  • Simple
  • Efficient for similar backends
  • Default behavior

Cons

  • Does not consider current load
  • Not ideal for long-running requests

Least Connections (least_conn)

Each new request is sent to the backend with the fewest active connections.

Server A → 10 active connections
Server B → 3 active connections
→ New request goes to Server B

Example Configuration

upstream app_backend {
    least_conn;

    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}
  • NGINX tracks the number of active connections per server
  • Each new request goes to the least busy server
  • Excellent for uneven or long-lived requests
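The selection rule reduces to "pick the minimum active count", as this toy Python sketch shows (the connection counts are invented for illustration; NGINX tracks the real ones internally):

```python
# Hypothetical active-connection counts per backend.
active = {"10.0.0.11:8080": 10, "10.0.0.12:8080": 3}

def pick_least_conn(active):
    # Choose the backend with the fewest active connections.
    return min(active, key=active.get)

chosen = pick_least_conn(active)
active[chosen] += 1  # the new request occupies one more connection
print(chosen)  # 10.0.0.12:8080
```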

Weighted Least Connections

upstream app_backend {
    least_conn;

    server 10.0.0.11 weight=2;
    server 10.0.0.12 weight=1;
}

NGINX factors each server's weight into the comparison, so a higher-weight server is allowed proportionally more active connections.
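One way to picture the weighted comparison is "active connections divided by weight" — a simplification of NGINX's internals, shown here as a Python sketch with invented numbers:

```python
# Hypothetical state: 10.0.0.11 has weight 2 and more active
# connections, but per unit of weight it is still less loaded.
servers = {
    "10.0.0.11": {"weight": 2, "active": 5},
    "10.0.0.12": {"weight": 1, "active": 3},
}

def pick_weighted_least_conn(servers):
    # Lower active/weight ratio wins: 5/2 = 2.5 beats 3/1 = 3.0.
    return min(servers, key=lambda s: servers[s]["active"] / servers[s]["weight"])

print(pick_weighted_least_conn(servers))  # 10.0.0.11
```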

Pros

  • Adapts to load
  • Ideal for slow APIs or streaming
  • Reduces overload

Cons

  • Slightly more overhead
  • Doesn’t track CPU or memory usage

IP Hash (ip_hash)

Client IP address is hashed to select a backend server.

Client IP → Hash → Server

Same client IP always maps to the same server (as long as it’s available).

upstream app_backend {
    ip_hash;

    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}
  • Client 203.0.113.10 → Server A
  • Client 203.0.113.10 → Server A again
  • Enables session persistence (sticky sessions)
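A toy Python version of the hash-then-map idea (note: NGINX's ip_hash actually hashes only the first three octets of an IPv4 address; this sketch hashes the whole string for brevity):

```python
import hashlib

backends = ["10.0.0.11:8080", "10.0.0.12:8080"]

def pick_by_ip(client_ip):
    # Hash the client address and map it onto the backend list.
    digest = hashlib.md5(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]

first = pick_by_ip("203.0.113.10")
second = pick_by_ip("203.0.113.10")
print(first == second)  # True: the same client always hits the same backend
```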

Use Case

  • Applications that store sessions in memory
  • Legacy systems without shared session storage

Limitations

  • Uneven distribution when many clients share an address behind NAT
  • Weight support for ip_hash was only added in NGINX 1.3.1/1.2.2
  • Adding or removing servers remaps most clients

Pros

  • Simple session persistence
  • No cookies required

Cons

  • Poor distribution with many clients behind NAT
  • Clients may be remapped when servers are added or removed

Choosing the Right Method

Scenario                 Best Method
Identical backends       Round-robin
Long-running requests    Least connections (least_conn)
In-memory sessions       IP hash (ip_hash)
Modern apps              least_conn + shared session storage

Real-World Production Example

upstream web_backend {
    least_conn;

    server 10.0.0.11:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.12:8080 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;

    location / {
        proxy_pass http://web_backend;

        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
  • Least-loaded server gets traffic
  • Failed servers temporarily removed
  • Headers preserve client identity

Health & Failover Behavior

NGINX:

  • Marks a server as unavailable after max_fails failed attempts within fail_timeout
  • Skips it for the duration of fail_timeout
  • Tries it again once fail_timeout expires, restoring it on success
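This passive failure handling can be modeled with a small Python class (an illustrative sketch assuming the max_fails=3 and fail_timeout=30s values from the example above; NGINX's real bookkeeping is per-worker and more involved):

```python
import time

MAX_FAILS = 3
FAIL_TIMEOUT = 30  # seconds

class Backend:
    def __init__(self, addr):
        self.addr = addr
        self.fails = 0
        self.down_until = 0.0

    def record_failure(self, now=None):
        now = now if now is not None else time.monotonic()
        self.fails += 1
        if self.fails >= MAX_FAILS:
            # Too many failures: skip this server for fail_timeout.
            self.down_until = now + FAIL_TIMEOUT
            self.fails = 0

    def available(self, now=None):
        now = now if now is not None else time.monotonic()
        return now >= self.down_until

b = Backend("10.0.0.11:8080")
for _ in range(3):
    b.record_failure(now=100.0)
print(b.available(now=100.0))  # False: marked failed after 3 failures
print(b.available(now=131.0))  # True: eligible again after fail_timeout
```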